

Linear regression without correspondence

Neural Information Processing Systems

This article considers algorithmic and statistical aspects of linear regression when the correspondence between the covariates and the responses is unknown. First, a fully polynomial-time approximation scheme is given for the natural least squares optimization problem in any constant dimension. Next, in an average-case and noise-free setting where the responses exactly correspond to a linear function of i.i.d. draws from a standard multivariate normal distribution, an efficient algorithm based on lattice basis reduction is shown to exactly recover the unknown linear function.
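To make the problem concrete, here is a minimal sketch on a one-dimensional noiseless toy instance. It uses a simple alternating-minimization heuristic (not the paper's approximation scheme or its lattice-based algorithm): for a fixed slope the best matching pairs sorted responses with sorted predictions, and for a fixed matching the slope is an ordinary least-squares refit.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy noiseless instance in one dimension: the responses are a linear
# function of the covariates, but the correspondence has been shuffled away.
n = 30
x = rng.standard_normal(n)
w_true = -1.7
y = rng.permutation(w_true * x)

def alternating_fit(x, y, w0, iters=20):
    """Heuristic alternating minimization (not the paper's method): for a
    fixed slope, the best matching pairs sorted responses with sorted
    predictions; for a fixed matching, refit the slope by least squares."""
    w = w0
    for _ in range(iters):
        y_matched = np.empty_like(y)
        y_matched[np.argsort(w * x)] = np.sort(y)  # best matching for slope w
        w = (x @ y_matched) / (x @ x)              # 1-D least-squares refit
    return w, float(np.linalg.norm(w * x - y_matched))

# The matching step cannot change the sign of the slope, so try both signs
# and keep whichever initialization achieves the smaller residual.
(w_pos, r_pos), (w_neg, r_neg) = alternating_fit(x, y, 1.0), alternating_fit(x, y, -1.0)
w_hat = w_pos if r_pos < r_neg else w_neg
```

In the noiseless one-dimensional case this recovers the true slope exactly; in higher dimensions or with noise the heuristic can get stuck, which is where the paper's algorithmic results come in.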


Estimating Learnability in the Sublinear Data Regime

Neural Information Processing Systems

We consider the problem of estimating how well a model class is capable of fitting a distribution of labeled data. We show that it is often possible to accurately estimate this "learnability" even when given an amount of data that is too small to reliably learn any accurate model. Our first result applies to the setting where the data is drawn from a $d$-dimensional distribution with isotropic covariance, and the label of each datapoint is an arbitrary noisy function of the datapoint. In this setting, we show that with $O(\sqrt{d})$ samples, one can accurately estimate the fraction of the variance of the label that can be explained via the best linear function of the data. We extend these techniques to binary classification and show that the prediction error of the best linear classifier can be accurately estimated given $O(\sqrt{d})$ labeled samples. For comparison, in both the linear regression and binary classification settings, even if there is no noise in the labels, a sample size linear in the dimension, $d$, is required to learn any function correlated with the underlying model. We further extend our estimation approach to the setting where the data distribution has an (unknown) arbitrary covariance matrix, allowing these techniques to be applied to settings where the model class consists of a linear function applied to a nonlinear embedding of the data. We demonstrate the practical viability of our approaches on synthetic and real data. This ability to estimate the explanatory value of a set of features (or dataset), even in the regime in which there is too little data to realize that explanatory value, may be relevant to scientific and industrial settings in which data collection is expensive and there are many potentially relevant feature sets that could be collected.
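The key observation in the isotropic setting can be illustrated with a simple pairwise (U-statistic) estimator: if $y = \langle\beta, x\rangle + \text{noise}$ with $x \sim N(0, I_d)$, then $E[y\,x] = \beta$, so for $i \neq j$ the quantity $y_i y_j \langle x_i, x_j\rangle$ has expectation $\|\beta\|^2$, the explainable variance. The sketch below illustrates this principle, not necessarily the paper's exact estimator, and uses more samples than $O(\sqrt{d})$ for a readable error bar while still staying well below $d$.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical instance of the isotropic setting: x ~ N(0, I_d) and
# y = <beta, x> + noise, with ||beta||^2 = 0.7 and noise variance 0.3,
# so the best linear function explains 70% of the variance of y.
d, n = 2000, 900                      # n is well below d
beta = rng.standard_normal(d)
beta *= np.sqrt(0.7) / np.linalg.norm(beta)
X = rng.standard_normal((n, d))
y = X @ beta + np.sqrt(0.3) * rng.standard_normal(n)

# Pairwise U-statistic: for i != j, E[y_i * y_j * <x_i, x_j>] = ||beta||^2,
# because E[y x] = beta under isotropic covariance. Average over all
# off-diagonal pairs.
G = (X @ X.T) * np.outer(y, y)
est_signal = (G.sum() - np.trace(G)) / (n * (n - 1))
est_fraction = est_signal / y.var()   # estimated explainable-variance fraction
```

Note that with $n = 900 \ll d = 2000$ samples, no regression fit could recover $\beta$ itself, yet the estimate of the explainable-variance fraction is already close to the true value of 0.7.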





Neural Information Processing Systems

Out of the box, these models take as input a sequence of vectors in embedding space and output a sequence of vectors in the same space. We treat the prediction of the model at the position corresponding to x_i (that is, absolute position 2i - 1) as the prediction of f(x_i). A.2 Training. Each training prompt is produced by sampling a random function f from the function class we are training on, then sampling inputs x_i from the isotropic Gaussian distribution N(0, I_d) and constructing a prompt as (x_1, f(x_1), ..., x_k, f(x_k)). For the class of decision trees, the random function f is represented by a decision tree of depth 4 (with 16 leaf nodes), with 20-dimensional inputs. Minimum-norm least squares is the optimal estimator for the linear regression problem.
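The prompt construction described above can be sketched as follows, using the linear function class for concreteness. The layout (function value in the first coordinate of the slot after its input) and all names are illustrative assumptions, not taken from the paper's code.

```python
import numpy as np

rng = np.random.default_rng(2)

d, k = 20, 40  # illustrative sizes; the decision-tree class above uses d = 20

def sample_prompt(d, k):
    """Build one training prompt (x1, f(x1), ..., xk, f(xk)) for a random
    linear function f(x) = <w, x>, with inputs x_i ~ N(0, I_d). Each scalar
    f(x_i) occupies the first coordinate of the vector following x_i, so
    x_i sits at absolute position 2i - 1 and f(x_i) at position 2i."""
    w = rng.standard_normal(d)
    xs = rng.standard_normal((k, d))
    ys = xs @ w
    prompt = np.zeros((2 * k, d))
    prompt[0::2] = xs        # inputs at odd absolute positions
    prompt[1::2, 0] = ys     # function values at even absolute positions
    return prompt

prompt = sample_prompt(d, k)  # shape (2k, d): alternating x_i and f(x_i) slots
```

Swapping in a different function class only changes how f is sampled and evaluated; the interleaved prompt layout stays the same.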







Neural Information Processing Systems

The task is to i) predict the unknown parameters, then ii) solve the optimization problem using the predicted parameters, such that the resulting solutions are good even under the true parameters.
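A minimal sketch of this two-stage pipeline, on a toy selection problem (all names and numbers here are illustrative, not from the paper): predict a cost vector, optimize under the prediction, then evaluate the chosen decision under the true costs.

```python
import numpy as np

rng = np.random.default_rng(3)

# Toy problem: choose the item with minimum cost among three items.
true_cost = np.array([3.0, 1.0, 2.0])

# Step i): a (noisy) prediction of the unknown parameters, standing in for
# whatever learned model produces them.
predicted_cost = true_cost + 0.3 * rng.standard_normal(3)

# Step ii): solve the optimization problem under the *predicted* parameters.
decision = int(np.argmin(predicted_cost))

# Quality of the decision is judged under the *true* parameters: the excess
# true cost relative to the best-in-hindsight decision (the regret).
regret = float(true_cost[decision] - true_cost.min())
```

The point of the setup is exactly this mismatch: the optimizer only ever sees the predicted parameters, but the solution's quality (the regret) is measured under the true ones.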